Management of cloud-based shared content using predictive cost modeling

ABSTRACT

Systems and methods for managing content in a cloud-based service platform. A method embodiment operates over storage content objects stored in storage devices in a cloud-based shared content management system. The method commences upon identifying a source object and identifying derivative objects that are generated based on properties of the source object. After a time, candidate eviction objects are identified. One form of analysis is performed over source objects and another form of analysis is performed over derivative objects. Derivative objects are classified using the analysis, and the classification is used to determine object management commands associated with each derivative object, such as a command to remove the derivative object from one storage location (e.g., a high-performance storage filer) and relocate it to another (e.g., lower cost) storage location. Based on the analysis, a derivative object might be deleted completely and then regenerated at a later time if and when it is needed.

FIELD

This disclosure relates to managing content in a cloud-based service platform, and more particularly to techniques for management of cloud-based shared content using predictive modeling.

BACKGROUND

Cloud-based content management services and platforms have impacted the way personal and corporate electronically stored information objects (e.g., files, images, videos, etc.) are stored, and have also impacted the way such personal and corporate content is shared and managed. One benefit of using cloud-based platforms is the ability to securely share content among trusted collaborators who access shared content from a variety of user devices such as mobile phones, tablets, laptop computers, desktop computers, and/or other devices from any geographic location. Such collaboration is facilitated by cloud-based content management service providers that manage access to various storage facilities that store the content objects created and/or uploaded by the collaborators.

Such storage facilities often comprise various types of storage appliances (e.g., network-attached storage (NAS) filers, distributed access and versioning (DAV) filers, cloud-accessible appliances, backup tape reader/writers, etc.), each of which has respective performance characteristics and associated costs. The aforementioned appliances store not only the source content objects provided by the collaborators, but also derivative content objects (e.g., derived objects) associated with the source content objects. For example, a PDF document that has been uploaded for sharing by multiple collaborators might have derived content objects (e.g., thumbnail images of each page) that have been generated for that document. Such derived content objects often include thumbnails, preview pages, index metadata, backup copies, and/or other derivative objects. In some cases, multiple duplicate copies (e.g., multiple backup copies) of a particular source content object might also be generated and stored (e.g., in accordance with a service level agreement (SLA) and/or other data retention policy).

As the proliferation of cloud-based content management services and the adoption of cloud-based collaboration rapidly increase, so does the number of source content objects and their associated derivative content objects. As such, the storage capacity needed to store such objects also increases, as do the capital and ongoing costs incurred to purchase and maintain that storage capacity.

Unfortunately, storage costs are often incurred to store content objects that are seldom accessed, no longer accessed, or never accessed at all. Such content objects can be referred to as being “warm” or “cold”, depending on the access pattern. Conversely, a content object that is frequently accessed can be referred to as being “hot”. For example, a file comprising a “daily tally” that is accessed by members of a corporate department might be referred to as being “hot”, whereas a file comprising “last year's roll-up” that might no longer have relevance might be referred to as being “cold”. As additional examples, certain derivative content objects (thumbnails, etc.) that had been generated with respect to an associated source content object can become “cold” even though the source content object may remain “warm” or “hot”. In these cases, the relatively higher capital costs and operating costs associated with storing cold content objects in high performance appliances are incurred, even though there is little or no benefit to storing those objects in such high performance appliances.

Some naïve approaches to storing cold content merely continue to deploy additional high performance appliances (e.g., additional filers, etc.) to accommodate the expanding corpus of source and derivative content objects. Such a naïve approach results in additional storage costs pertaining to the content objects that are considered cold. What is needed is a technological solution for managing storage capacity in the presence of a highly dynamic corpus of source and derivative storage objects in the context of a cloud-based content management environment.

What is needed is a technique or techniques to improve over legacy techniques and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

SUMMARY

The present disclosure provides a detailed description of techniques used in systems, methods, and in computer program products for management of cloud-based shared content, which techniques advance the relevant technologies to address technological issues with legacy approaches. More specifically, the present disclosure provides a detailed description of techniques used in systems, methods, and in computer program products for managing cloud-based shared content using predictive modeling. Certain embodiments are directed to technological solutions that apply a rule base and a predictive model to a set of content objects so as to identify one or more content management actions to be taken, such as deletion from one storage device (e.g., a high performance storage filer) and movement to a different location (e.g., onto lower-cost storage media).

The disclosed embodiments modify and improve over legacy approaches. In particular, the herein-disclosed techniques provide technical solutions that address the technical problems attendant to meeting high performance storage requirements in the presence of a highly dynamic corpus of shared content objects in a cloud-based content management environment. Such technical solutions relate to improvements in computer functionality. Various applications of the herein-disclosed improvements in computer functionality serve to reduce the demand for computer memory, reduce the demand for computer processing power, reduce network bandwidth use, and reduce the demand for inter-component communication. Some embodiments disclosed herein use techniques to improve the functioning of multiple systems within the disclosed environments, and some embodiments advance peripheral technical fields as well. As one specific example, use of the disclosed techniques and devices within the shown environments as depicted in the figures provides advances in the technical field of high performance online collaboration systems as well as advances in various technical fields related to data storage.

Further details of aspects, objectives, and advantages of the technological embodiments are described herein and in the drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings described below are for illustration purposes only. The drawings are not intended to limit the scope of the present disclosure.

FIG. 1A depicts usage of a cloud-based storage capacity reduction technique, according to an embodiment.

FIG. 1B depicts a content object processing technique as implemented in systems for dynamically managing cloud-based shared content using predictive modeling, according to an embodiment.

FIG. 1C depicts a cloud-based environment including a collaborative cloud-based shared content management platform that interacts with external storage facilities to facilitate dynamically managing cloud-based shared content using predictive modeling, according to some embodiments.

FIG. 2 presents a block diagram of a computing environment that supports various techniques as used in systems for dynamically managing cloud-based shared content using predictive modeling, according to an embodiment.

FIG. 3 illustrates a candidate eviction object selection technique as implemented in systems for dynamically managing cloud-based shared content using predictive modeling, according to an embodiment.

FIG. 4 presents a candidate eviction object ranking technique as implemented in systems for dynamically managing cloud-based shared content using predictive modeling, according to an embodiment.

FIG. 5 depicts a content object removal technique as implemented in systems for dynamically managing cloud-based shared content using predictive modeling, according to an embodiment.

FIG. 6 depicts a content object restoration technique as implemented in systems for dynamically managing cloud-based shared content using predictive modeling, according to an embodiment.

FIG. 7 depicts system components as arrangements of computing modules that are interconnected so as to implement certain of the herein-disclosed embodiments.

FIG. 8A and FIG. 8B present block diagrams of computer system architectures having components suitable for implementing embodiments of the present disclosure, and/or for use in the herein-described environments.

DETAILED DESCRIPTION

Embodiments in accordance with the present disclosure address the problem of managing highly-available storage appliances in the presence of a highly dynamic corpus of shared content objects in a cloud-based content management environment. The accompanying figures and discussions herein present example environments, systems, methods, and computer program products.

Overview

Disclosed herein are techniques for applying a rule base and a predictive model to a set of content objects so as to identify one or more object management commands that serve to achieve one or more quantitative objectives (e.g., a reduction in aggregate costs) while still achieving a given quantitative performance level (e.g., adherence to the performance specifications of a service level agreement (SLA)). In certain embodiments, a set of eviction rules is applied to the storage content objects (e.g., source objects, derivative objects, etc.) at each of a plurality of filer appliances to identify a set of candidate eviction objects and corresponding object management commands (e.g., a command to delete, a command to relocate, a command to compress, etc.). A cost model is consulted with respect to each candidate eviction object to generate two predicted costs: (1) a predicted retention cost (e.g., storage cost, computing cost, transmission cost, etc.) that pertains to processing the candidate eviction object at the filer appliance, and (2) a predicted disposal cost that pertains to removing the candidate eviction object from the filer appliance.

As used herein, candidate eviction objects are objects (e.g., files, data, etc.) that are stored in or on a storage device. For managing ongoing storage of objects, candidate eviction objects are ranked by a cost reduction score that is assigned to each of the candidate eviction objects based on the difference between the predicted retention cost (e.g., ongoing storage costs, etc.) and the predicted disposal cost (e.g., one-time removal cost, regeneration costs, etc.). For example, if a predicted retention cost for an object is higher than the predicted disposal (and possible regeneration) cost, then that object might become a candidate for disposal.
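The scoring and ranking just described can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: it assumes the two predicted costs have already been produced by some cost model, and the names `Candidate` and `rank_candidates` are hypothetical. Following the example in the paragraph, only objects whose retention cost exceeds their disposal cost are kept as disposal candidates.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate eviction object."""
    object_id: str
    predicted_retention_cost: float  # e.g., ongoing storage cost over some horizon
    predicted_disposal_cost: float   # e.g., one-time removal plus expected regeneration cost

    @property
    def cost_reduction_score(self) -> float:
        # Positive when keeping the object is predicted to cost more than disposing of it.
        return self.predicted_retention_cost - self.predicted_disposal_cost


def rank_candidates(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep candidates with a positive score and order them highest score first."""
    worthwhile = [c for c in candidates if c.cost_reduction_score > 0]
    return sorted(worthwhile, key=lambda c: c.cost_reduction_score, reverse=True)
```

For instance, with scores of 6.5, 4.0, and -1.0, the first two objects are returned in that order and the third is dropped, because disposing of it is predicted to cost more than keeping it.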

When certain events or signals are detected (e.g., an event arising from a breach of a storage device capacity utilization threshold), a corresponding set of object management commands pertaining to some or all of the candidate eviction objects is executed, possibly in a prescribed sequence or execution order corresponding to the ranking. In some embodiments, multiple object management commands are issued for certain candidate eviction objects, for example, to fortify the candidate eviction objects before removing them. As used herein, object management commands are computer instructions that can be carried out by a computer to cause one or more storage actions to be performed on a storage object. Such object management commands can be carried out by a filer, or by any server or other compute-capable entity. Object management commands are often raised as a result of storage subsystem activities.
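The following hypothetical Python sketch shows how multiple commands might be sequenced for one candidate eviction object. It assumes a policy in which source objects, and derivative objects whose regeneration cost meets a threshold, are fortified (copied to external storage) before deletion. Fortify-then-delete amounts to a relocation. The `Command` enum, `plan_commands`, and `fortify_threshold` are illustrative names and are not taken from the disclosure.

```python
from enum import Enum
from typing import List


class Command(Enum):
    """Hypothetical object management commands."""
    FORTIFY = "fortify"  # copy the object to external storage first
    DELETE = "delete"    # remove the object from the filer


def plan_commands(is_source: bool, regeneration_cost: float,
                  fortify_threshold: float) -> List[Command]:
    """Return an ordered command list for one candidate eviction object."""
    if is_source or regeneration_cost >= fortify_threshold:
        # Source objects cannot be regenerated, and costly derivatives are protected
        # as well, so a copy is made before the object leaves the filer.
        return [Command.FORTIFY, Command.DELETE]
    # A derivative that is cheap to regenerate can simply be deleted.
    return [Command.DELETE]
```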

In some embodiments, a set of filer activity data (e.g., filer access patterns, miss history, etc.) is recorded to facilitate candidate eviction object identification, scoring, and/or ranking, and/or to prescribe a command execution order. In some embodiments, removed content objects are restored (e.g., reloaded, regenerated, etc.) to the filer appliance based on observed filer activity. In some embodiments, object management command execution is halted when the conditions that triggered the eviction event signal have changed (e.g., the capacity utilization threshold is no longer breached).
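Here is a minimal Python sketch of executing evictions in ranked order and halting once the triggering condition clears. It assumes that utilization is measured as used bytes over capacity and that a breach means utilization above a high-water mark. It also assumes each eviction frees exactly the object's size. The function name and the `evict` callback are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def execute_evictions(
    ranked: Iterable[Tuple[str, int]],  # (object_id, size_bytes), already in ranked order
    used_bytes: int,
    capacity_bytes: int,
    high_water_mark: float,             # e.g., 0.8 for 80% utilization
    evict: Callable[[str], None],       # hypothetical hook that issues the commands
) -> List[str]:
    """Evict objects in order until utilization is no longer above the high-water mark."""
    evicted: List[str] = []
    for object_id, size in ranked:
        if used_bytes / capacity_bytes <= high_water_mark:
            break  # the eviction event condition has cleared, so halt execution
        evict(object_id)
        used_bytes -= size
        evicted.append(object_id)
    return evicted
```

With 95 of 100 bytes used, a mark of 0.8, and three 10-byte objects, two evictions bring utilization to 0.75 and the third command is never issued.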

Definitions and Use of Figures

Some of the terms used in this description are defined below for easy reference. The presented terms and their respective definitions are not rigidly restricted to these definitions; a term may be further defined by the term's use within this disclosure. The term “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion. As used in this application and the appended claims, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from the context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A, X employs B, or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. As used herein, at least one of A or B means at least one of A, or at least one of B, or at least one of both A and B. In other words, this phrase is disjunctive. The articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from the context to be directed to a singular form.

Various embodiments are described herein with reference to the figures. It should be noted that the figures are not necessarily drawn to scale and that elements of similar structures or functions are sometimes represented by like reference characters throughout the figures. It should also be noted that the figures are only intended to facilitate the description of the disclosed embodiments; they are not representative of an exhaustive treatment of all possible embodiments, and they are not intended to impute any limitation as to the scope of the claims. In addition, an illustrated embodiment need not portray all aspects or advantages of usage in any particular environment.

An aspect or an advantage described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced in any other embodiments even if not so illustrated. References throughout this specification to “some embodiments” or “other embodiments” refer to a particular feature, structure, material, or characteristic described in connection with the embodiments as being included in at least one embodiment. Thus, the appearance of the phrases “in some embodiments” or “in other embodiments” in various places throughout this specification does not necessarily refer to the same embodiment or embodiments. The disclosed embodiments are not intended to be limiting of the claims.

Descriptions of Example Embodiments

FIG. 1A depicts usage of a cloud-based storage capacity reduction technique 1A00. As an option, one or more variations of cloud-based storage capacity reduction technique 1A00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The cloud-based storage capacity reduction technique 1A00 or any aspect thereof may be implemented in any environment.

The cloud-based storage capacity reduction technique 1A00 presented in FIG. 1A illustrates one embodiment of techniques for dynamically managing cloud-based shared content using a rule base and/or predictive modeling. The techniques described herein address the problems attendant to minimizing highly-available storage requirements in the presence of a highly dynamic corpus of shared content objects in a cloud-based content management environment. In certain distributed computing environments, multiple instances of object filers (e.g., filer appliances, DAV filers, filers, etc.) are implemented to store content objects that can be accessed by various collaborators. In some cases, a replica of some of the content objects can be stored at an external storage facility (e.g., Amazon S3) for data loss protection in compliance with an associated service level agreement and/or data replication policy.

The storage content objects 103 that are stored in the object filers can be grouped into two categories: source objects 107 (e.g., original content) and derivative objects 109 (e.g., derived content). For example, as can be observed in FIG. 1A, a set of collaborators can create and/or upload source objects to the object filers (operation 1). Derivative objects associated with the source objects are generated using various transformation functions (operation 2). In some cases, the derivative objects are generated based on certain exhibited and/or predicted object access patterns of the collaborators (operation 3). As an example, if a user (e.g., collaborator) uploads an image to an object filer at the cloud-based content management platform, the original image and its encryption key are considered source content objects, while the thumbnails and previews of various resolutions are considered derivative content objects. One characteristic that distinguishes the source objects from the derivative objects is that the source objects cannot be regenerated by computerized means, whereas a derivative object is an object (e.g., file) that can be regenerated based on one or more properties of a respective source object or source object set.
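The distinction between source and derivative objects can be illustrated with a short Python sketch. The transformation functions here are hypothetical stand-ins for real thumbnail or preview renderers: a truncated "preview" and a SHA-256 digest used as "index metadata". The names `StoredObject`, `TRANSFORMS`, and `derive` are illustrative only. The sketch shows the property the paragraph relies on. A derivative records which source object and transformation produced it, so calling `derive` again reproduces it, whereas no such recipe exists for a source object.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical transformation functions keyed by derivative kind.
TRANSFORMS: Dict[str, Callable[[bytes], bytes]] = {
    "preview": lambda data: data[:16],                                # truncated preview
    "index": lambda data: hashlib.sha256(data).hexdigest().encode(),  # digest as index metadata
}


@dataclass
class StoredObject:
    object_id: str
    data: bytes
    source_id: Optional[str] = None  # set only for derivative objects
    transform: Optional[str] = None  # name of the transformation that produced it

    @property
    def is_derivative(self) -> bool:
        return self.source_id is not None


def derive(source: StoredObject, kind: str) -> StoredObject:
    """Generate (or regenerate) a derivative object from its source object."""
    return StoredObject(
        object_id=f"{source.object_id}:{kind}",
        data=TRANSFORMS[kind](source.data),
        source_id=source.object_id,
        transform=kind,
    )
```

Because the transformations are deterministic, a deleted derivative can be rebuilt on demand, which is what makes outright deletion a viable eviction action for derivatives.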

As mentioned earlier, certain content objects at the object filer can become cold (e.g., infrequently accessed). In particular, the use and purpose of derivative objects is often such that many derivative objects become cold. Further, cold objects are often stored using the expensive storage capacity of object filers, which comprise expensive hardware and consume significant space, power, and maintenance resources.

The herein disclosed techniques provide a technological solution to the foregoing problem by applying a rule base and a predictive model to the content objects to identify one or more content management actions that serve to achieve quantitative objectives (e.g., reducing storage costs, etc.). Specifically, in the embodiment shown in FIG. 1A, an instance of an object evictor applies a set of source object eviction rules to a corresponding set of source objects to identify one or more candidate source objects for eviction or removal from a corresponding object filer (operation 4). The object evictor further applies a set of derivative object eviction rules to the derivative objects to identify one or more candidate derivative objects for eviction or removal from the object filer (operation 5).

For example, the foregoing eviction rules might constrain the set of candidate eviction objects to those that can be removed from the object filer while remaining in compliance with a respective service level agreement for the objects. An objective analysis approach is then applied to the eviction candidates to determine a set of eviction actions to implement to achieve one or more quantitative objectives (e.g., reduction in aggregate costs). For example, a storage cost predictive model can be consulted to determine an eviction order for the candidate eviction objects 105 that most efficiently achieves a cost reduction objective (operation 6). A set of object management commands is then issued to the object filer to execute the eviction actions in the eviction order (operation 7). As illustrated, an eviction might comprise a true deletion of the content object and/or a move of the content object to an external storage facility (e.g., a lower cost storage facility). The effects of the prescribed object eviction and/or any effects from execution of the object management commands serve to achieve the quantitative objective or objectives (e.g., to reduce storage capacity usage and costs) (operation 8).
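Separate rule bases for source and derivative objects, as in operations 4 and 5, can be sketched as lists of predicates in Python. The specific rules and thresholds below are assumptions for illustration only and are not values from any SLA or from the disclosure. Under these hypothetical rules, a source object qualifies only if a replica already exists in external storage and it has been idle for 90 days, while a regenerable derivative object qualifies after 30 idle days. `ObjectStats`, `SOURCE_RULES`, `DERIVATIVE_RULES`, and `select_candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ObjectStats:
    object_id: str
    is_source: bool
    days_since_access: int
    has_external_replica: bool


Rule = Callable[[ObjectStats], bool]

# Hypothetical rule bases; thresholds are illustrative only.
SOURCE_RULES: List[Rule] = [
    lambda o: o.has_external_replica,     # a copy must survive eviction to stay SLA-compliant
    lambda o: o.days_since_access >= 90,
]
DERIVATIVE_RULES: List[Rule] = [
    lambda o: o.days_since_access >= 30,  # derivatives can be regenerated later
]


def select_candidates(objects: List[ObjectStats]) -> List[str]:
    """Return IDs of objects that satisfy every rule for their category."""
    selected: List[str] = []
    for o in objects:
        rules = SOURCE_RULES if o.is_source else DERIVATIVE_RULES
        if all(rule(o) for rule in rules):
            selected.append(o.object_id)
    return selected
```

The selected IDs would then be passed to a cost model and ranked, as in operation 6, rather than evicted directly.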

Further details describing the herein disclosed techniques for dynamic management of content objects stored on object filers are shown and described as pertaining to FIG. 1B.

FIG. 1B depicts a content object processing technique 1B00 as implemented in systems for dynamically managing cloud-based shared content using predictive modeling. As an option, one or more variations of content object processing technique 1B00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The content object processing technique 1B00 or any aspect thereof may be implemented in any environment.

The content object processing technique 1B00 presents one embodiment of certain steps and/or operations for dynamically managing cloud-based shared content using predictive modeling according to the herein disclosed techniques. In one or more embodiments, the steps and underlying operations comprising content object processing technique 1B00 can be executed by an object eviction engine 106, shown as implemented in one instance of a processing element that interacts with high performance storage (e.g., one or more filer appliances comprising a given filer fleet). Certain other system components coupled to the object eviction engine 106 are also shown for reference.

Specifically, content object processing technique 1B00 can commence by collecting certain access records and performance measurements pertaining to a filer appliance (step 122). As shown, filer access operations 113 and performance data 115 can be codified and stored in a set of filer activity data 112. For example, the object eviction engine 106 might record the requests for certain instances of source objects and/or derivative objects from a set of active content 104. In some cases, such requests might be characterized as a “miss” when the requested object is not in the active content 104. Misses can be related to requests for objects removed from the active content 104 for various reasons (e.g., eviction according to the herein disclosed techniques). In some cases, the miss might be related to a request for a derivative object that has not yet been generated. Misses will often invoke the execution of commands to restore (e.g., from an external storage facility) or regenerate (e.g., from a source object) the requested content object. The filer performance measurements can comprise measurements of various filer performance metrics such as storage utilization, CPU utilization, access latency, and/or other metrics.
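Recording hits and misses, and choosing how to recover a missed object, might look like the following Python sketch. The class `FilerActivity` and function `recovery_action` are hypothetical. So is the policy of preferring a restore from external storage over regeneration when both are possible. A real system might instead pick whichever is cheaper.

```python
from collections import Counter
from typing import Dict, Set


class FilerActivity:
    """Hypothetical recorder of filer access operations, including misses."""

    def __init__(self, active: Set[str]):
        self.active = active        # IDs currently in the active content store
        self.accesses = Counter()   # object_id -> number of requests
        self.misses = Counter()     # object_id -> number of requests that missed

    def record_request(self, object_id: str) -> bool:
        """Record a request; return True on a hit and False on a miss."""
        self.accesses[object_id] += 1
        if object_id in self.active:
            return True
        self.misses[object_id] += 1
        return False


def recovery_action(object_id: str, sources_of: Dict[str, str], external: Set[str]) -> str:
    """Choose how to bring a missed object back (hypothetical preference order)."""
    if object_id in external:
        return "restore"      # reload the fortified copy from external storage
    if object_id in sources_of:
        return "regenerate"   # rebuild the derivative from its source object
    return "unavailable"
```

The miss counters double as the miss history that later steps can consult when deciding whether an evicted object should be brought back proactively.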

According to the embodiment shown in FIG. 1B, certain derivative object eviction events are detected at the object eviction engine 106 (step 124). For example, the eviction event might be invoked by a condition (e.g., an eviction event condition) pertaining to a breach of a storage utilization threshold (e.g., a high-water mark) at the filer appliance 102. Responsive to detecting the eviction event condition, a set of derivative object eviction rules is applied to identify one or more candidate derivative objects for eviction from the filer (step 132). For example, the derivative object eviction rules might be formed by a subset of eviction rules 114, which are in turn accessed by the object eviction engine 106 and applied to the derivative objects in the active content 104. Application of the derivative object eviction rules serves to identify a set of candidate derivative objects.

The candidate derivative objects are then ranked using a storage cost predictive model (step 134). As an example, a storage cost predictive model 116 might predict a cost savings and/or other quantitative characteristics of each of the candidate derivative objects to facilitate the ranking. The candidate derivative objects can then be evicted from the active content store (e.g., active content 104) in an execution order corresponding to the earlier determined ranking (step 136). In certain implementations, the object eviction engine 106 can further monitor the filer activity data 112 to restore certain evicted content objects (e.g., removed content objects) based on access patterns (step 138). For example, the miss history in the filer activity data 112 might invoke a restoration of a source object or a regeneration of a derivative object to the active content 104.

The storage cost predictive model 116 can be populated, trained, and validated using any known techniques. In one embodiment, a storage cost predictive model is configured to predict retention cost based on (1) object size, (2) measured frequency of access, and (3) predicted frequency of access. Furthermore, the storage cost predictive model can be configured to predict disposal cost based on (1) object size, (2) current frequency of access, (3) cost to regenerate and/or reload the object, and (4) historical or forecasted access patterns. Strictly as examples, a historical access pattern can be formed by including (or excluding) characteristics of access events such as (a) ownership of the object, (b) organization to which the owner belongs, (c) department to which the owner belongs, etc. Furthermore, a forecasted access pattern might be formed based on an access pattern predictive model that has been trained and validated using historical accesses. In still other situations, historical access patterns can be formed by including (or excluding) characteristics of interactions (e.g., collaboration interactions) between members of one organization and members of another organization.
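A simple linear form of the retention and disposal cost predictions, using the inputs listed above, can be sketched in Python. This is an assumption-laden illustration rather than a trained model. All prices in `CostParams` are placeholder values. Measured (or current) and predicted (or forecast) access rates are blended with a fixed weight. The disposal side pessimistically treats every forecast access as a miss that incurs the regeneration or reload cost. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    # Placeholder prices, not figures from the disclosure.
    filer_price_per_gb_month: float = 0.10
    filer_cost_per_access: float = 0.0001  # compute/transmission cost of serving one access
    removal_cost_per_gb: float = 0.01
    horizon_months: float = 12.0
    measured_weight: float = 0.5           # blend of measured vs. predicted access rates


def expected_accesses(measured_per_month: float, predicted_per_month: float,
                      p: CostParams) -> float:
    rate = p.measured_weight * measured_per_month + (1 - p.measured_weight) * predicted_per_month
    return rate * p.horizon_months


def predicted_retention_cost(size_gb: float, measured_per_month: float,
                             predicted_per_month: float, p: CostParams = CostParams()) -> float:
    storage = size_gb * p.filer_price_per_gb_month * p.horizon_months
    serving = expected_accesses(measured_per_month, predicted_per_month, p) * p.filer_cost_per_access
    return storage + serving


def predicted_disposal_cost(size_gb: float, current_per_month: float, forecast_per_month: float,
                            regen_cost: float, p: CostParams = CostParams()) -> float:
    removal = size_gb * p.removal_cost_per_gb
    # Pessimistic simplification: every forecast access is a miss that pays regen_cost.
    misses = expected_accesses(current_per_month, forecast_per_month, p)
    return removal + misses * regen_cost
```

With these placeholders, a cold 2 GB derivative has a retention cost of about 2.4 and a disposal cost of 0.02, which yields a positive cost reduction score. If the same object sees one access per month at a regeneration cost of 0.5, the disposal cost rises to about 6.02 and eviction no longer pays off.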

Further details pertaining to implementing the herein disclosed techniques in the cloud-based environment are described and shown as pertaining to FIG. 1C. Further details pertaining to implementing a storage cost predictive model in the cloud-based environment are described and shown as pertaining to FIG. 2.

FIG. 1C depicts a cloud-based shared content management system 1C00 including a collaborative cloud-based shared content management platform that interacts with external storage facilities to dynamically manage cloud-based shared content using predictive modeling. As an option, one or more variations of cloud-based shared content management system 1C00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The cloud-based shared content management system 1C00 or any aspect thereof may be implemented in any environment. As used herein, a cloud-based shared content management system is a collection of storage devices that are organized to share stored objects among users (e.g., people) or agents (e.g., computer programs) that access the shared stored objects over a network using a web protocol such as HTTP.

As shown in FIG. 1C, certain users (e.g., collaborators 153) having various collaboration roles (e.g., creator, editor, administrator, approver, auditor, reviewer, etc.) can use one or more instances of user devices 152 to interact with one or more workspaces (e.g., workspace 156₁, workspace 156₂, etc.) facilitated by a cloud-based shared content storage system 158. As an example, collaborator 153₁ might be a content creator (e.g., video producer) with access to workspace 156₁, collaborator 153₃ might be a video viewer with access to workspace 156₂, and collaborator 153₂ might be an administrator with access to both workspaces. The workspaces can be stored in any location, and are at least partially maintained by components within the cloud-based shared content storage system 158.

The cloud-based shared content storage system 158 supports any variety of processing elements 160, such as the filer appliance 102 and/or any number of servers (e.g., server 101) comprising the object eviction engine 106, and/or other processing elements such as a content management server, a host server, a sync server, an application server, a cloud drive server, a content server, etc. The cloud-based shared content storage system 158 further facilitates access by the foregoing processing elements to any variety of storage devices 170, such as high performance storage 172 comprising the active content 104 and/or third party storage 174 comprising external storage 154. For example, the high performance storage 172 might support data access exhibiting a lower access latency as compared to the third party storage 174, but the third party storage 174 might exhibit a lower storage cost as compared to the high performance storage 172.

Any of the users can be provisioned authorized access to various portions of the content objects stored in the storage devices 170 without the need for manually downloading and storing a file locally on an instance of the user devices 152 (e.g., a desktop computer, a tablet, a WiFi phone, a workstation, a laptop computer, a smart phone, etc.). For example, one of the content objects (e.g., video file, computer file, text document, audio file, image file, etc.) created by collaborator 153₁ might be viewed by collaborator 153₃ without informing the other collaborators (e.g., collaborator 153₂, collaborator 153₃) where the file is physically stored in the storage devices 170.

User interaction with or through the aforementioned workspaces can facilitate access to certain source objects and their derivative objects by the collaborators. For example, a set of accessed objects 182₁ at workspace 156₁ might comprise a source object (e.g., source object sA) and various derivative objects (e.g., derivative object d3, derivative object d4, derivative object d5, and derivative object d6) associated with the source object. Further, a set of accessed objects 182₂ at workspace 156₂ might comprise source object sA and various derivative objects (e.g., derivative object d3, derivative object d7, derivative object d8, and derivative object d9) associated with the source object. As can be observed, source object sA and at least one of the derivative objects (e.g., derivative object d3) are accessed at both workspaces. As further shown, some of the derivative objects (e.g., derivative object d4 and derivative object d9) are identified as evicted (e.g., evicted object 184₁ and evicted object 184₂, respectively). In accordance with the herein disclosed techniques, such evicted objects might be removed from the active content 104 based on low or no access activity over a certain period of time (e.g., the objects are cold). In some cases, a source object or a derivative object might be “fortified” (e.g., copied to the external storage 154) before it is removed from the active content 104. For example, such fortification might be performed for objects that are expensive to regenerate (e.g., in terms of computing resources) or infeasible to regenerate (e.g., original objects).

In certain embodiments, the object eviction engine 106 described herein can facilitate the foregoing eviction operations and/or other operations pertaining to dynamically managing cloud-based shared content using predictive modeling. One embodiment of a system comprising the object eviction engine 106 and/or other components for implementing the herein disclosed techniques is shown and described as pertaining to FIG. 2.

FIG. 2 presents a block diagram of a computing environment 200 that supports various techniques as used in systems for dynamically managing cloud-based shared content using predictive modeling. As an option, one or more variations of computing environment 200 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein.

The computing environment 200 shown in FIG. 2 is merely one example of components and data flows implemented in a cloud-based content management platform to support any of the herein disclosed techniques. As can be observed, computing environment 200 comprises a server (e.g., server 101) that facilitates access by collaborators 153 to a set of content (e.g., shared content) stored on storage devices 170. Representative instances of certain components and data flows associated with a representative one (e.g., server 101) of the multiple processing elements are further shown. Some of the presented components and/or data flows might have respective instances implemented at other servers or filer appliances, while some components and/or data flows might be shared by or partitioned across multiple servers and/or filer appliances.

As shown, a storage service 208 (e.g., a Scala service) within server 101 implements a data access layer (e.g., application programming interface (API)) to facilitate the aforementioned access to the content at the storage devices 170, some of which may be components within a storage appliance. Specifically, the accessed content might comprise a set of active content 104 (e.g., “hotter” content) that includes a source object set 222 and a derivative object set 224. The storage service 208 further facilitates access to external storage 154. In some cases, the storage service 208 might contain functionality to generate derivative objects from the source objects.

To illustrate the herein disclosed techniques, suppose that the active content 104 in the embodiment shown is stored in a high-performance storage facility (e.g., a grouping of high-performance storage appliances) that stores the content objects determined to be or expected to be most frequently accessed (e.g., hot or warm content objects). Further, consider that the external storage 154 represents a set of tiered low-cost storage facilities (e.g., Amazon S3, Amazon Glacier, etc.) that stores at least some of a set of removed content objects 226 determined to be or expected to be rarely accessed (e.g., colder content objects). In this scenario, the external storage 154 might store a set of fortified objects 228, drawn from the removed content objects 226, that have been selected to be fortified at the external storage 154 and deleted from the active content 104.

To facilitate the foregoing management of the content objects across the various storage facilities available to the storage service 208 according to the herein disclosed techniques, an instance of an object eviction engine 106 is implemented within server 101. However, in other situations, an object eviction engine 106 or portions thereof might be implemented in a filer appliance. Various data stores and/or data structures are further implemented to support the herein disclosed techniques.

As shown in the embodiment of FIG. 2, a set of filer activity data 112, a set of eviction rules 114, a set of reload rules 218, and a candidate eviction object list 214 are cooperatively interconnected. Specifically, the object eviction engine 106 (e.g., implemented as a storage automation framework (SAF) task) receives a set of object access records 232 from the storage service 208 that describe various attributes (e.g., object identifier, timestamp, etc.) pertaining to object requests from the collaborators 153 and/or other entities and/or processes. The object access records 232 can be used to generate a set of object access patterns 236 stored in the filer activity data 112. The object access records 232 can further be used to generate an object miss history 238 that describes certain attributes (e.g., object identifier, timestamp, etc.) pertaining to requests for objects that had been removed from the active content 104. When a miss occurs, the requested object (e.g., a derivative object) can be regenerated by the storage service 208, or the requested object (e.g., a source object) can be retrieved from external storage 154 by the storage service 208.

In some embodiments, an object loader 216 within the object eviction engine 106 monitors the object access records 232 to build the object miss history 238. A set of filer performance measurements 234 that describe measurements of various filer appliance performance metrics and/or storage device performance metrics at a given moment in time can further be received at the object eviction engine 106. Such filer appliance performance metrics might pertain to storage utilization, CPU utilization, access latency, and/or other performance characteristics of the filer appliance. As an example, certain performance measurements (e.g., filer appliance utilization, storage device utilization) and/or combinations of measurements might characterize one or more eviction event conditions that invoke (or suspend or halt) a set of corresponding operations at the object eviction engine 106. As used herein, a storage device is any device capable of holding data using tangible media. Examples are hard disk drives, solid state storage devices, battery-backed RAM, etc.

Some such operations invoked at the object eviction engine 106 might involve an interaction with the candidate eviction object list 214. Specifically, the object eviction engine 106 might invoke certain processes to generate a list of eviction candidates from the active content 104. In some cases, generating the candidate eviction object list 214 can be performed by a background and/or offline process that scans (e.g., weekly) the access logs for the active content 104 to build a list of objects that might be candidates for eviction. In some cases, historical access logs are not available. In such cases, monotonically increasing object identifiers for the content objects might be used as an indication of the time of creation (e.g., “age”) of the objects.
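Strictly as an illustration, the candidate-building scan described above might be sketched as follows. The sketch assumes a hypothetical `last_access` map (objID to last access time) derived from the access logs, and a hypothetical `id_age_cutoff` below which an object is treated as old when no logs are available; neither name comes from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContentObject:
    obj_id: int      # assumed to increase monotonically with creation time
    size_mb: float


def build_candidate_list(
    objects: List[ContentObject],
    last_access: Optional[Dict[int, float]],  # hypothetical: objID -> last access time (s)
    now: float,
    cold_after_s: float,
    id_age_cutoff: int,                      # hypothetical age proxy used without logs
) -> List[int]:
    """Return the IDs of objects considered cold enough to be eviction candidates."""
    candidates: List[int] = []
    for obj in objects:
        if last_access is not None:
            # Access logs available: no recorded access, or idle too long, means cold.
            t = last_access.get(obj.obj_id)
            if t is None or now - t >= cold_after_s:
                candidates.append(obj.obj_id)
        elif obj.obj_id < id_age_cutoff:
            # No access logs: a lower ID was created earlier, so treat it as older.
            candidates.append(obj.obj_id)
    return candidates
```

Because identifiers increase monotonically with creation, comparing an identifier against a cutoff serves as a cheap stand-in for an age comparison when no access history exists.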

The shown partitioning of object eviction engine 106 comprises an object eviction rules engine 301, an object eviction ranking engine 401, an object remover engine 501, and an object reloader engine 601. Further, the shown partitioning of object eviction engine 106 includes an instance of a storage cost predictive model 116, which is accessible by any and all of the foregoing processing engines.

Various operations performed by the foregoing engines, or otherwise invoked at the object eviction engine 106, might in turn invoke an application of a set of eviction rules 114 (e.g., source object eviction rules 242 and derivative object eviction rules 244) to the candidate eviction object list 214 to determine a set of eviction actions 254 for the eviction candidates. In certain embodiments, the eviction actions 254 comprise a respective set of object management commands 256 that are issued to the storage service 208 to carry out the eviction action for a particular content object. In some cases, a storage cost predictive model 116 is consulted (e.g., by the object eviction ranking engine 401) to determine an execution order 252 for the candidate eviction object list 214. For example, the storage cost predictive model 116 might produce a set of predicted retention costs 246 that pertain to retaining the candidate eviction objects in a filer appliance, and a set of predicted removal costs 248 that pertain to removing the candidate eviction objects from the filer appliance.

The foregoing predicted costs can be used to assign a cost reduction score to the candidate eviction objects 105, which score is used to determine the execution order 252. When certain eviction event conditions are detected at the object eviction engine 106, the respective object management commands of some or all of the objects on the candidate eviction object list 214 are executed in the execution order 252. In some cases, certain instances of the removed content objects 226 are restored (e.g., reloaded, regenerated, etc.) to the active content 104 based on the filer activity data 112. As an example, a particular object might be restored when the object miss history 238 indicates the object has transitioned from a cold state to a warm state. Specifically, the object reloader engine 601 can apply the reload rules 218 to the object miss history 238 to determine the object management commands to issue to the storage service 208 so as to restore certain content objects.

The computing environment 200 shown in FIG. 2 presents merely one partitioning. The specific example shown is purely exemplary, and other subsystems and/or partitionings are reasonable.

FIG. 3 illustrates a candidate eviction object selection technique 300 as implemented in systems for dynamically managing cloud-based shared content using predictive modeling. As an option, one or more variations of candidate eviction object selection technique 300 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The candidate eviction object selection technique 300 or any aspect thereof may be implemented in any environment.

FIG. 3 depicts one embodiment of various steps and/or operations implemented within an object eviction rules engine 301. The operations serve to select eviction object candidates. The eviction rules are codified in specialized data structures that are designed to improve the way a computer stores and retrieves data in memory when performing such steps and/or operations. As illustrated, the shown steps and/or operations can be performed by an instance or component of an object eviction engine (e.g., object eviction engine 106) as described herein.

The candidate eviction object selection technique 300 can commence at step 302 by gathering statistics and/or usage patterns pertaining to accesses to the active content. Then, based on the gathered information, an initial list of candidate eviction objects is created from the active content of a given filer appliance (step 303). As an example, snapshots (e.g., weekly Hive-hosted snapshots) of the file and version information associated with the active content 104 might be combined with corresponding access logs (e.g., download logs from a download proxy) to produce the candidate eviction object list 214. As shown, the candidate eviction object list 214 might comprise various attributes (e.g., candidate list attributes 322) that describe the candidate eviction objects on the list. The data comprising the candidate eviction object list 214 are often organized and/or stored in a tabular structure (e.g., Hadoop HBase table) having rows corresponding to a particular candidate object and columns corresponding to various attributes pertaining to that candidate object. For example, as depicted in the candidate list attributes 322, a table row might describe an object identifier or “objID”, a storage service identifier or “ssID” (e.g., used to identify an associated filer appliance), a source object identifier or “source ID” associated with the candidate object (e.g., source ID=null for source objects), an object “size” (e.g., in MB), a list of “actions”, an execution order position or “priority”, a candidate list schema version, and/or other attributes. In some embodiments, the key to each item (e.g., row) in the candidate eviction object list 214 comprises a combination of a hashed version of the “ssID”, the “priority” attribute, and the “objID” attribute. The hashed “ssID” and the “priority” can have fixed widths to facilitate priority-ordered list scan operations on a per-“ssID” basis.
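A minimal sketch of the composite row key follows, assuming an MD5-based hash of the “ssID” truncated to eight hex characters and a ten-digit zero-padded “priority”. The hash function and both widths are illustrative assumptions, and the sketch keys only rows that have a numeric priority.

```python
import hashlib

SSID_HASH_WIDTH = 8   # hex characters kept from the ssID hash (assumed width)
PRIORITY_WIDTH = 10   # zero-padded decimal digits for the priority (assumed width)


def scan_prefix(ss_id: str) -> str:
    """Fixed-width hashed ssID; restricts a sorted key scan to one storage service."""
    return hashlib.md5(ss_id.encode("utf-8")).hexdigest()[:SSID_HASH_WIDTH]


def row_key(ss_id: str, priority: int, obj_id: str) -> str:
    """Compose a candidate-list row key: hash(ssID) | priority | objID."""
    return f"{scan_prefix(ss_id)}{priority:0{PRIORITY_WIDTH}d}{obj_id}"
```

Because every key for a given “ssID” shares the same fixed-width prefix, and zero padding makes lexicographic order match numeric order, a sorted scan over that prefix returns the candidates in priority order.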

The candidate eviction object selection technique 300 further involves interaction (e.g., by the object eviction engine 106) with various eviction rules (e.g., eviction rules 114) corresponding to the candidate eviction objects (step 304). The eviction rules 114 are applied (e.g., object-by-object) to the candidate eviction object list 214 to assign one or more eviction actions to each candidate eviction object (step 306). A rule base such as eviction rules 114 comprises data records storing various information that can be used to form one or more constraints to apply to certain functions and/or operations. For example, an eviction rule might specify a set of object attributes that, if matched by the attributes describing a certain candidate object, select a corresponding set of eviction actions to assign to that particular candidate object. As a specific case, a derivative object that is characterized as non-deterministic (e.g., multiple derivative generation executions may produce varying results) might be assigned an eviction action that comprises a fortify operation followed by an evict operation, while a derivative object that is characterized as deterministic (e.g., multiple derivative generation executions produce the same result) might be assigned an eviction action that comprises merely an evict operation. In some cases, the information pertaining to an eviction rule might comprise various input parameters that are consumed by corresponding conditional logic statements that are executed to determine one or more results (e.g., eviction actions).
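Strictly as an example, a rule base that matches candidate attributes and selects eviction actions might be sketched as below. The attribute names (`kind`, `type`), the action strings, and the first-match policy are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Each rule: (attributes that must all match, eviction actions to assign).
# Attribute names and action strings are illustrative assumptions.
EvictionRule = Tuple[Dict[str, str], List[str]]

EVICTION_RULES: List[EvictionRule] = [
    ({"kind": "derivative", "type": "non-deterministic"}, ["fortify", "evict"]),
    ({"kind": "derivative", "type": "deterministic"}, ["evict"]),
    ({"kind": "source"}, ["fortify", "evict"]),
]


def assign_actions(candidate: Dict[str, str], rules: List[EvictionRule]) -> List[str]:
    """Return the actions of the first rule whose attributes all match the candidate."""
    for match, actions in rules:
        if all(candidate.get(key) == value for key, value in match.items()):
            return list(actions)
    return []  # no rule matched: the object stays in active storage
```

Ordering the rules from most to least specific keeps the first-match policy predictable as rules are added.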

For example, data storage location constraints pertaining to a service level agreement associated with a given content object might be applied to a selected conditional logic statement (e.g., if-then statement, case statement, etc.) to determine whether the content object is to be retained in active storage, fortified and evicted, or merely evicted. In such cases, the eviction rules 114 might be organized and/or stored as programming code objects that receive object-specific input parameters, such as rule input parameters 324, that are used to determine certain results (e.g., eviction actions) corresponding to each candidate eviction object. The data comprising the rule input parameters 324 are often organized and/or stored in a tabular structure (e.g., relational database table) having rows corresponding to a particular candidate object and columns corresponding to various rule input parameters pertaining to that candidate object. For example, as depicted in the rule input parameters 324, a table row might describe an object identifier or “objID”, a service level agreement (SLA) identifier or “slaID”, a candidate object “type” (e.g., deterministic, non-deterministic, original, etc.), an expected time of next use for the object or “nextUse”, and/or other parameters.
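The SLA-driven conditional logic might look like the following sketch, which consumes a row of rule input parameters. It assumes a hypothetical table mapping an “slaID” to a storage-location constraint; the SLA identifiers and constraint names are invented for illustration and are not defined in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RuleInput:
    obj_id: str     # "objID"
    sla_id: str     # "slaID"
    obj_type: str   # "type": "deterministic", "non-deterministic", or "original"


# Hypothetical SLA table: slaID -> storage-location constraint.
SLA_LOCATION_CONSTRAINTS: Dict[str, str] = {
    "sla-retain": "active-only",        # object must stay in active storage
    "sla-two-replica": "two-replicas",  # a second copy must exist before eviction
    "sla-default": "none",
}


def eviction_decision(p: RuleInput) -> str:
    """Return 'retain', 'fortify-and-evict', or 'evict' for one candidate."""
    constraint = SLA_LOCATION_CONSTRAINTS.get(p.sla_id, "none")
    if constraint == "active-only":
        return "retain"
    if constraint == "two-replicas" or p.obj_type in ("original", "non-deterministic"):
        # The SLA needs a replica, or the object cannot be regenerated exactly.
        return "fortify-and-evict"
    return "evict"
```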

Other possible examples of eviction actions determined by applying the eviction rules 114 to the candidate eviction object list 214 follow. In one example, some original content (e.g., source content objects) might be fortified prior to being evicted so as to maintain a two-replica SLA constraint. In another example, for derivative objects (e.g., image thumbnail) having a one-to-one relationship with a source object (e.g., an image), the derivative object can merely be evicted since the derivative object can be regenerated if requested at a later moment in time. As yet another example, for derivative objects (e.g., document preview) having a one-to-many relationship with multiple assets (e.g., the pages of the preview), the assets associated with the derivative object are identified and assigned a selected eviction action (e.g., all pages in the preview are evicted).

In still other situations, some derivative objects might also exhibit a time of expected reuse or next use as indicated by the “nextUse” attribute. For example, medical records might be accessed near the time of an upcoming appointment, or tax records might be accessed more frequently during tax season. Such expected reuse time might factor into decisions pertaining to eviction actions. As an option, the “nextUse” indicator might be used to establish a set of time-based eviction actions that facilitate deletion of an object for a certain period of time (e.g., to avoid storage costs) and regeneration of the object prior to its expected use.
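Strictly as an illustration, a time-based eviction action derived from “nextUse” might be computed as below. The regeneration lead time (`regen_lead_s`) and the minimum idle period that makes a deletion worthwhile (`min_idle_s`) are hypothetical tuning parameters.

```python
from typing import Optional, Tuple


def time_based_actions(
    now: float,
    next_use: Optional[float],  # "nextUse", as a timestamp in seconds
    regen_lead_s: float,        # hypothetical: how early to regenerate before use
    min_idle_s: float,          # hypothetical: shortest deletion worth performing
) -> Optional[Tuple[float, float]]:
    """Return (delete_at, regenerate_at), or None if a timed deletion is not worthwhile."""
    if next_use is None:
        return None
    regenerate_at = next_use - regen_lead_s
    if regenerate_at - now < min_idle_s:
        # The object would be absent too briefly to save meaningful storage cost.
        return None
    return (now, regenerate_at)
```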

When the eviction rules 114 are applied to the candidate eviction object list 214 and the eviction actions for the candidate objects in the list are determined, the candidate eviction object list 214 is updated with the object management commands to carry out the determined eviction actions (step 308). Specifically, for example, the “action” attribute of a particular candidate object is updated to describe the commands from the object management commands 256 selected to carry out the eviction action or actions for the candidate object. More specifically, in certain embodiments, the “/evict” service call from the object management commands 256 requests that the storage service evict the subject object (e.g., designated by the “objID”) from the active content 104.

In some cases, the storage service might override the “/evict” command. For example, the then-current data retention policy (e.g., from the SLA) might forbid any type of eviction of the object. As another example, there might be an insufficient number of replicas (e.g., according to a data replication policy) to evict the object. In this case, the “/fortify” command might be included in the eviction “actions” and executed prior to executing an “/evict” command. In certain embodiments, the “/fortify” call instructs the storage service to invoke an asynchronous job to perform the data transmission (e.g., object copy) required to achieve fortification. If the optional “&evict=true” service call parameter is provided, an “/evict” call is automatically triggered upon successful completion of the “/fortify” operation. The “/load” call is issued, for example, by an object loader to restore a subject object to the active content 104. Further details pertaining to restoring removed content objects are shown and described as pertaining to FIG. 6.
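The following sketch models how a storage service might honor or override these calls. It assumes a replication policy requiring at least one copy outside the active content before eviction (`MIN_EXTERNAL_REPLICAS`, a hypothetical constant), and it runs the fortify step synchronously for simplicity even though the text describes an asynchronous job.

```python
from dataclasses import dataclass, field
from typing import List

MIN_EXTERNAL_REPLICAS = 1  # hypothetical policy: one copy outside active content


@dataclass
class StoredObject:
    obj_id: str
    external_replicas: int = 0
    retention_forbids_eviction: bool = False
    in_active: bool = True
    log: List[str] = field(default_factory=list)


def handle_evict(obj: StoredObject) -> bool:
    """'/evict': the storage service may override (refuse) the request."""
    if obj.retention_forbids_eviction or obj.external_replicas < MIN_EXTERNAL_REPLICAS:
        obj.log.append("evict-overridden")
        return False
    obj.in_active = False
    obj.log.append("evicted")
    return True


def handle_fortify(obj: StoredObject, evict: bool = False) -> None:
    """'/fortify': copy the object to external storage (synchronous in this sketch)."""
    obj.external_replicas += 1
    obj.log.append("fortified")
    if evict:  # models the optional '&evict=true' parameter
        handle_evict(obj)
```

Even with `evict=True`, a retention policy that forbids eviction still causes the follow-on eviction to be overridden.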

In some situations, still further variations of object management commands 256 are provided. Such additional object management commands might include a “/get” command (e.g., for requesting a subject object), a “/put” command (e.g., to put a source object to a shared storage device), a “/tag” command (e.g., to add tagging metadata to a corresponding object), a “/hint” command (e.g., to add classification or other information to metadata corresponding to intended restrictions or behaviors over the object), and a “/delete” command (e.g., to delete a subject object from a shared storage device).

Strictly as examples, a “/tag” might refer to formally specified data features or characteristics such as “PERMANENT” (e.g., to suggest that no eviction should take place on the file) or “TEMPORARY” (e.g., used to explicitly identify short-lived files that can be quickly deleted or moved to lower-cost storage). Other tags are possible. Tags can be used to explicitly override default behaviors.

The execution of a “/hint” command might add still more information that might influence behavior. For example, metadata provided by a “/hint” command might inform the system whether an uploaded object is to be regarded as an intermediate object or as a final product object, or it may provide an indication that an object is a highly connected object (e.g., in association with other objects in a provenance relationship). In some cases, a “/hint” (or other command) may indicate that derivative objects are to be regenerated immediately upon a change to a respective source object.

Performing such management commands on objects can influence aspects of future management commands. A series of management commands can be strung together or otherwise associated to form patterns. Strictly as one example, if a recurring pattern of /load followed by /evict, followed by /load is detected, that pattern might be indicative of an object where the predicted retention cost is equal or nearly equal to the predicted removal cost. Such a scenario might result in thrashing. Accordingly, detection of the recurring pattern might cause a bias value or hysteresis value to be stored in metadata pertaining to the thrashing object, so as to avoid continued thrashing.
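A minimal sketch of the thrashing check follows. It counts overlapping occurrences of the /load, /evict, /load pattern in an object's command history and, once the pattern has recurred a hypothetical `min_occurrences` times, increments a hysteresis value kept in the object's metadata. The step size and threshold are assumptions.

```python
from typing import Dict, List

THRASH_PATTERN = ["load", "evict", "load"]


def count_pattern(history: List[str], pattern: List[str] = THRASH_PATTERN) -> int:
    """Count (possibly overlapping) occurrences of the pattern in a command history."""
    n = len(pattern)
    return sum(1 for i in range(len(history) - n + 1) if history[i:i + n] == pattern)


def update_hysteresis(
    metadata: Dict[str, float],
    history: List[str],
    step: float = 0.1,          # hypothetical increment per detection
    min_occurrences: int = 2,   # hypothetical threshold for "recurring"
) -> None:
    """Raise the stored hysteresis for objects whose history keeps repeating the pattern."""
    if count_pattern(history) >= min_occurrences:
        metadata["hysteresis"] = metadata.get("hysteresis", 0.0) + step
```

The stored value can then be subtracted from the object's cost reduction score so that a near-tie between retention and removal no longer flips the decision on every cycle.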

At step 310, once a candidate list and associated items have been determined (e.g., according to the candidate eviction object selection technique 300), various techniques can be used to determine the “priority” or execution order of the eviction candidates. One such prioritization or ranking technique is shown and described as pertaining to FIG. 4.

FIG. 4 presents a candidate eviction object ranking technique 400 as implemented in systems for dynamically managing cloud-based shared content using predictive modeling. As an option, one or more variations of candidate eviction object ranking technique 400 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The candidate eviction object ranking technique 400 or any aspect thereof may be implemented in any environment.

The candidate eviction object ranking technique 400 presents one embodiment of an object eviction ranking engine 401 that implements certain steps and/or operations for ranking and/or prioritizing for eviction one or more candidate eviction objects according to the herein disclosed techniques. In one or more embodiments, the steps and underlying operations comprising candidate eviction object ranking technique 400 can be executed by an object eviction ranking engine 401, or by any variations of the object eviction engine 106 described herein.

The shown candidate eviction object ranking technique 400 can commence by identifying the candidate eviction objects to analyze (step 402). For example, the identified candidate eviction objects might comprise the candidate eviction object list 214. In certain embodiments, analysis of candidate eviction objects comprises use of a storage cost predictive model (e.g., storage cost predictive model 116) to determine predicted retention costs and predicted removal costs for the candidate eviction objects (step 404). Learning models such as the storage cost predictive model 116 are a collection of mathematical techniques (e.g., algorithms) that facilitate determining (e.g., predicting) a set of outputs (e.g., outcomes, responses) based on a set of inputs (e.g., stimuli). A storage cost predictive model can include data that includes an indicator corresponding to whether the stimuli pertain to a source object or to a derivative object. Moreover, given such an indication, a source object can be subjected to an analysis using a first set of criteria (e.g., costs pertaining to the source object), and a derivative object can be subjected to an analysis using a second set of criteria (e.g., costs pertaining to the derivative object).

For example, a storage cost predictive model 116 might consume a set of object attributes (e.g., source object indication, derivative object indication, size, number of related objects, predicted access patterns, etc.) as inputs to predict a set of storage costs and removal costs as outputs. Such costs might pertain to ongoing storage facility costs, and might also pertain to computing costs and networking costs for restoring (e.g., retrieving, regenerating, uploading, etc.) removed content objects, quantitative “cost” representations of the user experience (e.g., related to request delays, etc.), and/or other costs. In some cases, the techniques implemented by the model might comprise a set of equations having coefficients that relate one or more of the input variables to one or more of the output variables. In these cases, the equations and coefficients can be determined by a training process.
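Strictly as an example of such a set of equations, the sketch below predicts retention and removal costs as linear functions of the object attributes, with one coefficient set for source objects and another for derivative objects (the first and second sets of criteria). All coefficient values are placeholders; in practice they would be produced by the training process.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ObjectFeatures:
    is_derivative: bool
    size_mb: float
    related_objects: int
    predicted_accesses_per_month: float


# cost name -> (intercept, per-MB, per-related-object, per-predicted-access).
# Placeholder values only; a trained model would supply these coefficients.
Coeffs = Dict[str, Tuple[float, float, float, float]]
SOURCE_COEFFS: Coeffs = {
    "retention": (0.0, 0.020, 0.0, 0.0),
    "removal": (0.0, 0.004, 0.0, 0.050),    # external storage plus retrieval on access
}
DERIVATIVE_COEFFS: Coeffs = {
    "retention": (0.0, 0.020, 0.0, 0.0),
    "removal": (0.0, 0.0, 0.010, 0.030),    # regeneration compute, driven by accesses
}


def predict_costs(f: ObjectFeatures) -> Dict[str, float]:
    """Return predicted retention and removal costs from a linear model."""
    coeffs = DERIVATIVE_COEFFS if f.is_derivative else SOURCE_COEFFS
    x = (1.0, f.size_mb, f.related_objects, f.predicted_accesses_per_month)
    return {name: sum(c * xi for c, xi in zip(cs, x)) for name, cs in coeffs.items()}
```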

To facilitate comparison of the candidate eviction objects, the objects are plotted in an objective space defined by a set of quantitative objectives such as the foregoing predicted cost metrics (step 406). For example, a set of evaluated candidate eviction objects 414 can be plotted in a two-dimensional objective space defined by a predicted retention cost objective and a predicted removal cost objective. Any number of other objectives is possible. An objective function relating the objectives (e.g., predicted retention cost and predicted removal cost) in the objective space can be used to determine cost reduction scores for the plotted candidate eviction objects (step 408).

As an example, the objective function comprising quantitative objectives 416 might have characteristics that identify the points in the objective space. Other characteristics (e.g., slopes, polynomial orders, etc.) pertaining to the objective function can define different objective spaces. In some cases, and as shown, the plotted candidate eviction objects to the right of and below the objective function comprising quantitative objectives 416 are identified as eviction candidates 424 (e.g., the predicted cost in a first dimension is less than or equal to the predicted cost in a second dimension), while the remaining plotted candidate content objects are identified as disqualified candidates 422 (e.g., the predicted cost in the first dimension is greater than the predicted cost in the second dimension).

In the specific case shown, the plotted candidate eviction objects to the right of and below the objective function comprising quantitative objectives 416 are identified as eviction candidates 424 since the predicted cost in the first dimension (e.g., predicted removal cost) is less than or equal to the predicted cost in the second dimension (e.g., the predicted retention cost). The scores can be used with or without other information (e.g., a bias value and/or a hysteresis value) to assign an eviction priority 418 to the candidate eviction objects (step 410).
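The scoring and prioritization of steps 408 and 410 might be sketched as follows, assuming the cost reduction score is the predicted retention cost minus the predicted removal cost, less an optional per-object bias or hysteresis value. Candidates with a negative score are disqualified and keep a null priority; the score definition and the tie-breaking by objID are assumptions.

```python
from typing import Dict, Optional, Tuple


def assign_priorities(
    costs: Dict[str, Tuple[float, float]],    # objID -> (predicted retention, predicted removal)
    bias: Optional[Dict[str, float]] = None,  # objID -> bias/hysteresis penalty
) -> Dict[str, Optional[int]]:
    """Return objID -> eviction priority (1 = evict first); None marks a disqualified candidate."""
    bias = bias or {}
    scores: Dict[str, float] = {}
    for obj_id, (retention, removal) in costs.items():
        score = retention - removal - bias.get(obj_id, 0.0)
        if score >= 0:  # removal cost <= retention cost (after any bias)
            scores[obj_id] = score
    # Highest cost reduction first; objID breaks ties deterministically.
    ranked = sorted(scores, key=lambda o: (-scores[o], o))
    priorities: Dict[str, Optional[int]] = {o: None for o in costs}
    for rank, obj_id in enumerate(ranked, start=1):
        priorities[obj_id] = rank
    return priorities
```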

An objective function can be formed using any number of terms (e.g., variables and/or constants). As an example, one objective function might consider the (lowered) cost of fortifying an object to external storage at the same time that the cost of access/retrieval from the external storage is considered. As a specific case, although it might be 50% less expensive to fortify an object to cold storage than it would be to retain the object in a filer, it might be twice as costly (e.g., due to access fees charged by the external storage vendor) to retrieve it from the cold storage. As such, the objective function might include the likelihood of future access in the objective function calculation.
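Using the figures from the example above (cold storage at half the filer retention cost, retrieval from cold storage at twice the filer access cost), the expected cost of each option can be weighted by a hypothetical access probability p. With R as the per-period filer retention cost and g as the cost of one access from the filer (both symbols introduced here for illustration), keeping costs R + p·g and fortifying costs 0.5·R + 2·p·g, so the two break even at p = R / (2g).

```python
from typing import Tuple


def expected_costs(retain_cost: float, filer_access_cost: float,
                   access_probability: float) -> Tuple[float, float]:
    """Expected per-period cost of (keeping in the filer, fortifying to cold storage)."""
    keep = retain_cost + access_probability * filer_access_cost
    # Assumptions taken from the example: cold storage costs 50% of filer retention,
    # and one access from cold storage costs twice as much as one from the filer.
    cold = 0.5 * retain_cost + access_probability * 2.0 * filer_access_cost
    return keep, cold


def break_even_probability(retain_cost: float, filer_access_cost: float) -> float:
    """Access probability at which both options cost the same: p = R / (2g)."""
    return retain_cost / (2.0 * filer_access_cost)
```

Below the break-even probability, fortifying to cold storage is expected to be cheaper; above it, retaining the object in the filer is.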

As illustrated, some representation (e.g., numbered sequence) of the eviction priority 418 can be stored in the “priority” attribute from the candidate list attributes 322 describing the candidate eviction object list 214. In this case, the “priority” attribute of the disqualified candidates 422 might remain “null” to distinguish the disqualified candidates 422 from the eviction candidates 424 prioritized for execution. The eviction priority 418 can be used to determine an eviction execution order invoked by certain eviction event conditions as shown and described as pertaining to FIG. 5.

FIG. 5 depicts a content object removal technique 500 as implemented in systems for dynamically managing cloud-based shared content using predictive modeling. As an option, one or more variations of content object removal technique 500 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The content object removal technique 500 or any aspect thereof may be implemented in any environment.

The content object removal technique 500 presents one embodiment of certain steps and/or operations for removing (e.g., evicting) one or more candidate eviction objects according to the herein disclosed techniques. In one or more embodiments, the steps and underlying operations comprising content object removal technique 500 can be executed by the shown object remover engine 501, or by an instance or component of an object eviction engine 106 described herein.

The content object removal technique 500 can commence by detecting one or more eviction event conditions (step 502). For example, the object eviction engine 106 might monitor the storage capacity used at a given filer appliance. When the object eviction engine 106 detects that the capacity utilization has breached some threshold (e.g., high-water mark), the object eviction engine 106 can invoke various eviction processes. In some embodiments, the thresholds can be specific to a given filer appliance to manage the eviction activities and/or policies on a per-filer basis. Certain of the aforementioned detected eviction event conditions might invoke the identification of any candidate eviction objects having actions that are prioritized for execution (step 504). From these candidate eviction objects, the highest priority eviction candidate can be selected (step 506).

The object management command or commands associated with the selected eviction candidate are then issued (e.g., to the storage service) to carry out the earlier determined eviction actions for the selected object (step 508). If more eviction candidates are available (see “Yes” path of decision 510) and the eviction event conditions (e.g., breached filer utilization threshold) remain (see “No” path of decision 512) such that the execution of the invoked eviction process should continue, then the next highest priority eviction candidate is selected (step 514). The object management command or commands associated with the selected eviction candidate are issued (step 508), and the process continues until either (1) no eviction candidates remain (see “No” path of decision 510) or (2) the eviction event conditions change (see “Yes” path of decision 512) such that the execution of the invoked eviction process should be suspended or halted (step 516). As one specific example of suspension, if (e.g., as a result of a series of evictions) the filer utilization drops below a particular utilization threshold, then the object remover engine might suspend itself until conditions change, causing another iteration within the object remover engine (e.g., iterating from an event detection at step 502).
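The loop of steps 502 through 516 might be sketched as below. The sketch assumes a separate low-water threshold for suspension (a common hysteresis choice, not mandated by the text) and injects the utilization probe and command issuer as callables so the control flow can be exercised without a real filer; all names are hypothetical.

```python
from typing import Callable, List, Tuple


def run_eviction(
    candidates: List[Tuple[int, str]],      # (priority, objID); lower number = evict first
    utilization: Callable[[], float],       # current filer utilization
    issue_commands: Callable[[str], None],  # sends the object's stored commands
    high_water: float,
    low_water: float,
) -> List[str]:
    """Evict in priority order until candidates run out or utilization falls below low_water."""
    evicted: List[str] = []
    if utilization() < high_water:          # step 502: no eviction event condition
        return evicted
    for _, obj_id in sorted(candidates):    # steps 504, 506, 514: highest priority first
        if utilization() < low_water:       # decision 512: condition cleared, suspend
            break
        issue_commands(obj_id)              # step 508
        evicted.append(obj_id)
    return evicted                          # decision 510 "No" path: candidates exhausted
```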

In some cases, the rate (e.g., eviction rate 522) at which the eviction actions of the selected eviction candidates are executed is regulated. The eviction rate 522 (e.g., evictions per second) can be defined on a per-filer basis to govern the load on the filer appliance and/or the associated networking and storage resources related to the eviction operations. In other cases, one or more of the removed content objects resulting from any of the foregoing eviction operations are restored. One embodiment of a technique for restoring removed content objects is shown and described as pertaining to FIG. 6.
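One way to regulate a per-filer eviction rate is to space successive evictions on each filer by at least 1/rate seconds, as in the sketch below. The class name and its interface are hypothetical; the clock and sleep functions are injectable so that the limiter can be tested without real delays.

```python
import time
from typing import Callable, Dict


class PerFilerRateLimiter:
    """Spaces eviction commands so each filer sees at most `rate` evictions per second."""

    def __init__(self, rates: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._interval = {filer: 1.0 / r for filer, r in rates.items()}
        self._next_ok: Dict[str, float] = {}
        self._clock = clock
        self._sleep = sleep

    def wait(self, filer: str) -> None:
        """Block until the next eviction on `filer` is allowed, then reserve that slot."""
        now = self._clock()
        ready = self._next_ok.get(filer, now)
        if ready > now:
            self._sleep(ready - now)
            now = ready
        self._next_ok[filer] = now + self._interval[filer]
```

Calling `wait(filer)` immediately before issuing each eviction command keeps the rate bounded per filer without coupling one filer's backlog to another's.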

FIG. 6 depicts a content object restoration technique 600 as implemented in systems for dynamically managing cloud-based shared content using predictive modeling. As an option, one or more variations of content object restoration technique 600 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The content object restoration technique 600 or any aspect thereof may be implemented in any environment.

FIG. 6 depicts one embodiment of various steps and/or operations to restore removed content objects previously evicted from active content according to the herein disclosed techniques. As illustrated, the shown steps and/or operations can be performed by an instance of an object reloader engine 601, or by any operations or components of the object eviction engine 106 described herein.

The content object restoration technique 600 can commence by receiving a set of object access records (step 602) that can be used to build an object miss history (step 604). For example, the object access records 232 can be used to store or generate an object miss history. Implementations or codifications of an object miss history might comprise various attributes (e.g., object miss history attributes 638) that describe the content object request misses. The data comprising the object miss history are often organized and/or stored in a tabular structure (e.g., a Hadoop HBase table) having rows corresponding to a particular missed object and columns corresponding to various attributes pertaining to that missed object. For example, as depicted in the object miss history attributes 638, a table row might describe an object identifier or “objID”, a list of timestamps pertaining to misses for the object or “missTimes”, a description of the type of miss or “type” (e.g., due to eviction, due to unavailability of the filer, due to the object being in the process of being restored, etc.), a timestamp pertaining to the last reload of the object or “reloadTime”, and/or other attributes.
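A minimal sketch of building an object miss history from object access records (steps 602 and 604) follows. It assumes each access record is an `(obj_id, timestamp, outcome)` tuple in which `outcome` is either `"hit"` or a miss type; that record shape, the `MissHistoryRow` type, and the in-memory dictionary standing in for an HBase table are illustrative assumptions. The field names mirror the “objID”, “missTimes”, “type”, and “reloadTime” attributes described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class MissHistoryRow:
    obj_id: str  # "objID"
    miss_times: List[float] = field(default_factory=list)  # "missTimes"
    miss_type: Optional[str] = None  # "type", e.g. "evicted", "filer_unavailable", "restoring"
    reload_time: Optional[float] = None  # "reloadTime" (set when the object is reloaded)


# Hypothetical access-record shape: (obj_id, timestamp, outcome).
AccessRecord = Tuple[str, float, str]


def build_miss_history(records: Iterable[AccessRecord]) -> Dict[str, MissHistoryRow]:
    """Group miss records by object; the dict stands in for one table row per object."""
    history: Dict[str, MissHistoryRow] = {}
    for obj_id, ts, outcome in sorted(records, key=lambda r: r[1]):
        if outcome == "hit":
            continue  # only misses are recorded in the miss history
        row = history.setdefault(obj_id, MissHistoryRow(obj_id))
        row.miss_times.append(ts)
        row.miss_type = outcome  # keep the most recent miss type
    return history
```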

A set of reload rules (e.g., reload rules 218) is applied to the object miss history 238 to identify removed content objects for reload (step 606). As an example, the reload rules might indicate that any removed content object for which N misses have occurred over a time period T (e.g., as determined from the object miss history 238) is to be reloaded. In this case, the variables N and/or T are tunable on a filer-by-filer basis, an ownership basis, an SLA basis, and/or an object-by-object basis. In some cases, the then-current predicted costs (e.g., from a storage cost predictive model) of restoring the subject removed content object can also be used to determine whether to restore it. When a determination has been made to restore a removed content object, the object management commands to reload the subject object are issued (step 608). For example, a “/load” command for the subject object might be issued by the object loader 216 (see FIG. 2) to a storage service at the filer appliance. In some cases, the “/load” command might invoke a move and/or a copy of the subject object (e.g., a source object) from external storage to active content. In other cases, the “/load” command might invoke a regeneration of the subject object (e.g., a derivative object) by the storage service.
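The “N misses over a time period T” reload rule, with N and T tunable per filer, might be sketched as below. The `ReloadRule` type, the `filer_of` lookup, and the fallback `default` rule are hypothetical, and the optional cost-based check described above is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ReloadRule:
    min_misses: int  # N: misses required to trigger a reload
    window_s: float  # T: look-back window, in seconds


def objects_to_reload(
    miss_times: Dict[str, Sequence[float]],
    filer_of: Dict[str, str],
    rules: Dict[str, ReloadRule],
    default: ReloadRule,
    now: float,
) -> List[str]:
    """Return IDs of removed objects with at least N misses in the last T seconds."""
    selected: List[str] = []
    for obj_id, times in miss_times.items():
        # Per-filer tuning: fall back to the default rule if the filer has no override.
        rule = rules.get(filer_of.get(obj_id, ""), default)
        recent = sum(1 for t in times if now - t <= rule.window_s)
        if recent >= rule.min_misses:
            selected.append(obj_id)
    return sorted(selected)
```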

Additional Embodiments of the Disclosure

Additional Practical Application Examples

FIG. 7 depicts a system 700 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. This and other embodiments present particular arrangements of elements that, individually and/or as combined, serve to form improved technological processes that address minimizing highly-available storage requirements in the presence of a highly dynamic corpus of shared content objects in a cloud-based content management environment. As an option, the system 700 may be implemented in the context of the architecture and functionality of the embodiments described herein. Of course, however, the system 700 or any operation therein may be carried out in any desired environment. The system 700 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 705, and any operation can communicate with other operations over communication path 705. The modules of the system can, individually or in combination, perform method operations within system 700. Any operations performed within system 700 may be performed in any order unless otherwise specified in the claims.
The shown embodiment implements a portion of a computer system, presented as system 700, comprising a computer processor to execute a set of program code instructions (module 710) and modules for accessing memory to hold program code instructions to perform: identifying one or more candidate eviction objects from the content objects, where at least one of the candidate eviction objects is a derivative content object derived from a respective one or more of the content objects (module 720); determining one or more object management commands associated with the candidate eviction objects (module 730); detecting at least one eviction event condition associated with at least one of the filer appliances (module 740); and initiating, responsive to detecting the eviction event condition, at least one of the object management commands (module 750).
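Modules 730 through 750 can be illustrated with the following sketch, which chooses commands for candidate eviction objects and releases them only when an eviction event condition is detected. It assumes that each object carries a predicted retention cost and a predicted removal cost (for a derivative object, the removal cost could reflect the expected cost of regenerating it from its source) and that the event condition is a simple filer utilization threshold. The `ContentObject` type, the cost fields, and the delete-versus-relocate policy are illustrative assumptions, not the claimed method.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ContentObject:
    obj_id: str
    source_id: Optional[str]  # None for a source object; else the object it derives from
    predicted_retention_cost: float  # assumed model output: cost of keeping it on the filer
    predicted_removal_cost: float  # assumed model output: cost of removal (incl. regeneration)


def determine_commands(obj: ContentObject) -> List[str]:
    """Module 730 (sketch): pick commands by comparing the two predicted costs."""
    if obj.predicted_removal_cost >= obj.predicted_retention_cost:
        return []  # removal is not expected to pay off; keep the object
    if obj.source_id is not None:
        return ["delete"]  # derivative: can be regenerated from its source if needed
    return ["relocate"]  # source: move to lower-cost storage rather than delete


def on_filer_check(objects: Iterable[ContentObject], utilization: float,
                   threshold: float) -> Dict[str, List[str]]:
    """Modules 740/750 (sketch): return the commands to initiate, but only
    when filer utilization exceeds the threshold (the eviction event)."""
    if utilization <= threshold:
        return {}  # no eviction event condition; hold the commands
    plan: Dict[str, List[str]] = {}
    for obj in objects:
        cmds = determine_commands(obj)
        if cmds:
            plan[obj.obj_id] = cmds
    return plan
```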

Variations of the foregoing may include more or fewer of the shown modules. Certain variations may perform more or fewer (or different) steps, and/or certain variations may use data elements in more, or in fewer (or different) operations.

System Architecture Overview

Additional System Architecture Examples

FIG. 8A depicts a block diagram of an instance of a computer system 8A00 suitable for implementing embodiments of the present disclosure. Computer system 8A00 includes a bus 806 or other communication mechanism for communicating information. The bus interconnects subsystems and devices such as a central processing unit (CPU), or a multi-core CPU (e.g., data processor 807), a system memory (e.g., main memory 808, or an area of random access memory (RAM)), a non-volatile storage device or non-volatile storage area (e.g., read-only memory 809), an internal storage device 810 or external storage device 813 (e.g., magnetic or optical), a data interface 833, and a communications interface 814 (e.g., PHY, MAC, Ethernet interface, modem, etc.). The aforementioned components are shown within processing element partition 801; however, other partitions are possible. The shown computer system 8A00 further comprises a display 811 (e.g., CRT or LCD), various input devices 812 (e.g., keyboard, cursor control), and an external data repository 831.

According to an embodiment of the disclosure, computer system 8A00 performs specific operations by data processor 807 executing one or more sequences of one or more program code instructions contained in a memory. Such instructions (e.g., program instructions 802₁, program instructions 802₂, program instructions 802₃, etc.) can be contained in or can be read into a storage location or memory from any computer readable/usable storage medium such as a static storage device or a disk drive. The sequences can be organized to be accessed by one or more processing entities configured to execute a single process or configured to execute multiple concurrent processes to perform work. A processing entity can be hardware-based (e.g., involving one or more cores) or software-based, and/or can be formed using a combination of hardware and software that implements logic, and/or can carry out computations and/or processing steps using one or more processes and/or one or more tasks and/or one or more threads or any combination thereof.

According to an embodiment of the disclosure, computer system 8A00 performs specific networking operations using one or more instances of communications interface 814. Instances of the communications interface 814 may comprise one or more networking ports that are configurable (e.g., pertaining to speed, protocol, physical layer characteristics, media access characteristics, etc.), and any particular instance of the communications interface 814 or port thereto can be configured differently from any other particular instance. Portions of a communication protocol can be carried out in whole or in part by any instance of the communications interface 814, and data (e.g., packets, data structures, bit fields, etc.) can be positioned in storage locations within communications interface 814, or within system memory, and such data can be accessed (e.g., using random access addressing, or using direct memory access (DMA), etc.) by devices such as data processor 807.

The communications link 815 can be configured to transmit (e.g., send, receive, signal, etc.) any types of communication packets (e.g., communication packet 838₁, . . . , communication packet 838_N) comprising any organization of data items. The data items can comprise a payload data area 837, a destination address 836 (e.g., a destination IP address), a source address 835 (e.g., a source IP address), and can include various encodings or formatting of bit fields to populate the shown packet characteristics 834. In some cases, the packet characteristics include a version identifier, a packet or payload length, a traffic class, a flow label, etc. In some cases, the payload data area 837 comprises a data structure that is encoded and/or formatted to fit into byte or word boundaries of the packet.

In some embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement aspects of the disclosure. Thus, embodiments of the disclosure are not limited to any specific combination of hardware circuitry and/or software. In embodiments, the term “logic” shall mean any combination of software or hardware that is used to implement all or part of the disclosure.

The term “computer readable medium” or “computer usable medium” as used herein refers to any medium that participates in providing instructions to data processor 807 for execution. Such a medium may take many forms including, but not limited to, non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks such as disk drives or tape drives. Volatile media includes dynamic memory such as a random access memory.

Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, or any other magnetic medium; CD-ROM or any other optical medium; punch cards, paper tape, or any other physical medium with patterns of holes; RAM, PROM, EPROM, FLASH-EPROM, or any other memory chip or cartridge, or any other non-transitory computer readable medium. Such data can be stored, for example, in any form of external data repository 831, which in turn can be formatted into any one or more storage areas, and which can comprise parameterized storage 839 accessible by a key (e.g., filename, table name, block address, offset address, etc.).

Execution of the sequences of instructions to practice certain embodiments of the disclosure is performed by a single instance of the computer system 8A00. According to certain embodiments of the disclosure, two or more instances of computer system 8A00 coupled by a communications link 815 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions required to practice embodiments of the disclosure using two or more instances of components of computer system 8A00.

The computer system 8A00 may transmit and receive messages such as data and/or instructions organized into a data structure (e.g., communications packets). The data structure can include program instructions (e.g., application code 803), communicated through communications link 815 and communications interface 814. Received program code may be executed by data processor 807 as it is received and/or stored in the shown storage device or in or upon any other non-volatile storage for later execution. Computer system 8A00 may communicate through a data interface 833 to a database 832 on an external data repository 831. Data items in a database can be accessed using a primary key (e.g., a relational database primary key).

The processing element partition 801 is merely one sample partition. Other partitions can include multiple data processors, and/or multiple communications interfaces, and/or multiple storage devices, etc. within a partition. For example, a partition can bound a multi-core processor (e.g., possibly including embedded or co-located memory), or a partition can bound a computing cluster having a plurality of computing elements, any of which computing elements are connected directly or indirectly to a communications link. A first partition can be configured to communicate to a second partition. A particular first partition and particular second partition can be congruent (e.g., in a processing element array) or can be different (e.g., comprising disjoint sets of components).

A module as used herein can be implemented using any mix of any portions of the system memory and any extent of hard-wired circuitry including hard-wired circuitry embodied as a data processor 807. Some embodiments include one or more special-purpose hardware components (e.g., power control, logic, sensors, transducers, etc.). Some embodiments of a module include instructions that are stored in a memory for execution so as to implement algorithms that facilitate operational and/or performance characteristics pertaining to dynamically managing cloud-based shared content using predictive modeling. A module may include one or more state machines and/or combinational logic used to implement or facilitate the operational and/or performance characteristics pertaining to dynamically managing cloud-based shared content using predictive modeling.

Various implementations of the database 832 comprise storage media organized to hold a series of records or files such that individual records or files are accessed using a name or key (e.g., a primary key or a combination of keys and/or query clauses). Such files or records can be organized into one or more data structures (e.g., data structures used to implement or facilitate aspects of dynamically managing cloud-based shared content using predictive modeling). Such files, records, or data structures can be brought into and/or stored in volatile or non-volatile memory. More specifically, the occurrence and organization of the foregoing files, records, and data structures improve the way that the computer stores and retrieves data in memory, for example, to improve the way data is accessed when the computer is performing operations pertaining to dynamically managing cloud-based shared content using predictive modeling, and/or for improving the way data is manipulated when performing computerized operations pertaining to applying a rule base and a predictive model to a set of content objects to identify one or more content management actions that serve to achieve quantitative objectives.

FIG. 8B depicts a block diagram of an instance of a cloud-based environment 8B00. Such a cloud-based environment supports access to workspaces through the execution of workspace access code (e.g., workspace access code 842₀, workspace access code 842₁, and workspace access code 842₂). Workspace access code can be executed on any of the shown access devices 852 (e.g., laptop device 852₄, workstation device 852₅, IP phone device 852₃, tablet device 852₂, smart phone device 852₁, etc.). A group of users can form a collaborator group 858, and a collaborator group can be composed of any types or roles of users. For example, and as shown, a collaborator group can comprise a user collaborator, an administrator collaborator, a creator collaborator, etc. Any user can use any one or more of the access devices, and such access devices can be operated concurrently to provide multiple concurrent sessions and/or other techniques to access workspaces through the workspace access code.

A portion of workspace access code can reside in and be executed on any access device. Any portion of the workspace access code can reside in and be executed on any computing platform 851, including in a middleware setting. As shown, a portion of the workspace access code resides in and can be executed on one or more processing elements (e.g., processing element 805₁). The workspace access code can interface with storage devices such as the shown networked storage 855. Workspaces and/or any constituent files or objects, and/or any other code or scripts or data, can be stored in any one or more storage partitions (e.g., storage partition 804₁). In some environments, a processing element includes forms of storage such as RAM and/or ROM and/or FLASH, and/or other forms of volatile and non-volatile storage.

A stored workspace can be populated via an upload (e.g., an upload from an access device to a processing element over an upload network path 857). A stored workspace can be delivered to a particular user and/or shared with other particular users via a download (e.g., a download from a processing element to an access device over a download network path 859).

In the foregoing specification, the disclosure has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the disclosure. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the disclosure. The specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.

What is claimed is:
 1. A method for managing one or more storage content objects stored in one or more storage devices in a cloud-based shared content management system, the method comprising: identifying a source object of the storage content objects; identifying at least one derivative object that has been generated based on one or more properties of the source object; determining a set of one or more candidate eviction objects from the storage content objects wherein at least one of the candidate eviction objects is the at least one derivative object; analyzing the source object based at least in part on a first set of criteria pertaining to the source object; analyzing the at least one derivative object based at least in part on a second set of criteria pertaining to the at least one derivative object; determining, based at least in part on the analysis of the derivative object, one or more object management commands associated with the derivative object; and initiating execution of at least one of the object management commands.
 2. The method of claim 1, wherein the second set of criteria pertaining to the at least one derivative object is a cost to regenerate the derivative object from the source object.
 3. The method of claim 1, wherein the one or more object management commands comprise at least one of, a command to delete, a command to relocate, or a command to compress, or any combination thereof.
 4. The method of claim 1, further comprising determining, based at least in part on the analyzing, one or more object management commands associated with the candidate eviction objects.
 5. The method of claim 1, wherein at least one of, determining the set of one or more candidate eviction objects, or determining the object management commands, comprises applying one or more eviction rules to the storage content objects.
 6. The method of claim 1, wherein at least one of, an object management command, or an execution order of two or more of the object management commands, is determined from one or more quantitative objectives.
 7. The method of claim 6, wherein the quantitative objectives are derived from at least one of, a predicted retention cost, or a predicted removal cost.
 8. The method of claim 7, wherein at least one of, the predicted retention cost, or the predicted removal cost, is determined by a storage cost predictive model.
 9. The method of claim 1, further comprising recording a set of filer activity data that characterizes one or more filer access operations at the storage devices.
 10. The method of claim 9, wherein at least one of, identifying the candidate eviction objects, or determining the object management commands, is based at least in part on the filer activity data.
 11. The method of claim 1, further comprising restoring, to at least one of the storage devices, one or more removed storage content objects.
 12. The method of claim 11, wherein restoring the removed storage content objects is based at least in part on a set of filer activity data.
 13. The method of claim 12, wherein restoring the removed storage content objects comprises regenerating a previously removed derivative object.
 14. The method of claim 1, further comprising delaying the execution of at least one of the object management commands until at least one eviction event condition is detected.
 15. The method of claim 14, wherein the eviction event condition is based at least in part on one or more filer performance measurements.
 16. The method of claim 14, further comprising suspending execution of the at least one of the object management commands responsive to detecting a change to the eviction event condition.
 17. A computer readable medium, embodied in a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when stored in memory and executed by one or more processors, causes the one or more processors to perform a set of acts for managing one or more storage content objects stored in one or more storage devices in a cloud-based shared content management system, the acts comprising: identifying a source object of the storage content objects; identifying at least one derivative object that has been generated based on one or more properties of the source object; determining a set of one or more candidate eviction objects from the storage content objects wherein at least one of the candidate eviction objects is the at least one derivative object; analyzing the source object based at least in part on a first set of criteria pertaining to the source object; analyzing the at least one derivative object based at least in part on a second set of criteria pertaining to the at least one derivative object; determining, based at least in part on the analysis of the derivative object, one or more object management commands associated with the derivative object; and initiating execution of at least one of the object management commands.
 18. The computer readable medium of claim 17, wherein the second set of criteria pertaining to the at least one derivative object is a cost to regenerate the derivative object from the source object.
 19. A system for managing one or more storage content objects stored in one or more storage devices in a cloud-based shared content management system, the system comprising: a storage medium having stored thereon a sequence of instructions; and one or more processors that execute the instructions to cause the one or more processors to perform a set of acts, the acts comprising, identifying a source object of the storage content objects; identifying at least one derivative object that has been generated based on one or more properties of the source object; determining a set of one or more candidate eviction objects from the storage content objects wherein at least one of the candidate eviction objects is the at least one derivative object; analyzing the source object based at least in part on a first set of criteria pertaining to the source object; analyzing the at least one derivative object based at least in part on a second set of criteria pertaining to the at least one derivative object; determining, based at least in part on the analysis of the derivative object, one or more object management commands associated with the derivative object; and initiating execution of at least one of the object management commands.
 20. The system of claim 19, wherein the second set of criteria pertaining to the at least one derivative object is a cost to regenerate the derivative object from the source object.